Performance and misc improvements #365
{
    if (clause.Data?.Count == null || clause.Data?.Count == 0)
    {
        yield return new Violation(string.Format(Strings.Get("Err_ClauseNoData"), rule.Name, clause.Label ?? rule.Clauses.IndexOf(clause).ToString(CultureInfo.InvariantCulture)), rule, clause);
Out of curiosity, how is yield helping here?
The API you implement is an enumerable, so:
- You can fail fast when validating and stop at the first error if you don't care what they all are.
- The function doesn't have to keep a list of the errors it has found.
You must implement this method as an IEnumerable to inherit from OATOperation.
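To illustrate the two points above, here is a minimal sketch of a lazy validator. `YieldSketch`, `Validate`, and the string error messages are hypothetical stand-ins for illustration only; they are not the OAT API or the `Violation` type from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldSketch
{
    // Hypothetical validator: each entry stands in for clause.Data?.Count.
    // Errors are produced lazily, one at a time, as the caller enumerates.
    public static IEnumerable<string> Validate(IEnumerable<int?> dataCounts)
    {
        int index = 0;
        foreach (int? count in dataCounts)
        {
            if (count == null || count == 0)
            {
                // No list is built here; the caller decides how many to consume.
                yield return $"Clause {index} has no data";
            }
            index++;
        }
    }

    public static void Main()
    {
        var counts = new int?[] { 3, 0, null, 5 };

        // Fail fast: enumeration stops as soon as the first error is produced.
        string first = Validate(counts).First();
        Console.WriteLine(first); // prints "Clause 1 has no data"

        // Or collect everything when the full report is wanted.
        List<string> all = Validate(counts).ToList();
        Console.WriteLine(all.Count); // prints 2
    }
}
```

A caller that only needs a pass/fail answer can use `Validate(counts).Any()`, which likewise stops at the first error.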
@@ -0,0 +1,20 @@
// Copyright (C) Microsoft. All rights reserved. Licensed under the MIT License.

using Microsoft.CST.OAT;
Is OatSubstringIndex generic enough to move into OAT (https://github.com/microsoft/OAT/tree/main/OAT/Operations), or did I get that wrong?
A few issues.
First, it's not much different from the contains operator that already exists, which is much broader in the data types it can work on.
Secondly, built-in operations in OAT should not be for one specific data type (for example, this one only works for strings).
The main distinction is that we capture and return the index of the match, a behavior that is not mirrored in any other OAT operation and doesn't make contextual sense outside of simple string matches.
See the existing AI (Application Inspector) code: AI uses its own regex operation instead of the OAT regex because OAT doesn't deal with Boundary objects; that's an AI concept.
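To make "capturing and returning the index of the match" concrete, here is a rough sketch of a string-only search that reports where each match starts. `SubstringIndexSketch.FindAll` and its tuple result are hypothetical; the actual OatSubstringIndexOperation and the Boundary type are not shown in this conversation and may look quite different.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringIndexSketch
{
    // Hypothetical helper: returns (Index, Length) for every non-overlapping
    // occurrence of needle in text. It only makes sense for strings, which is
    // why an operation like this is a poor fit for a general-purpose library.
    public static List<(int Index, int Length)> FindAll(
        string text, string needle, bool ignoreCase = false)
    {
        var results = new List<(int Index, int Length)>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(needle))
        {
            return results;
        }

        StringComparison comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        int start = 0;
        while (start <= text.Length - needle.Length)
        {
            int idx = text.IndexOf(needle, start, comparison);
            if (idx < 0)
            {
                break;
            }
            results.Add((idx, needle.Length));
            start = idx + needle.Length; // continue after this match
        }
        return results;
    }

    public static void Main()
    {
        foreach (var (index, length) in FindAll("Http and HTTP and http", "http", ignoreCase: true))
        {
            Console.WriteLine($"{index},{length}"); // prints 0,4 then 9,4 then 18,4
        }
    }
}
```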
* First Commit 1.3 Beta (#344) * Add an AsyncEnumerable version of getting results. * Fix * Add rewritten parallel implementation * Use the new methods * Fix metadata in HTML report * Adds a progress bar * Use the data from the metadata object for the progress bar * Fixes * Fix not incrementing number of total files. * clean up * Fix end of line finding for comment checking. * Simplify pack rules * Refactor FileChecksPassed * Don't serialize unneeded values. * Fix test case * Adds the GetTags command * Remove tests that test removed functionality * Build Fixes * Nullability fixes * Fix tests * Fix GetTags and tests * Fix MetaData.cs * Fix test * Fix #342 Adds per file timeout * Adds timeout to gettags * typo * Improve progress bar * Add Metadata for files scanned and time taken to scan * Add ScanState field for analyze metadata * Fix * Add file timings to get tags command. * Rewrite Metadata and Metadata helper to simplify collection. * Misc Cleanup * Fixes * Update GetTagsCommand.cs * Spruce up progress bars. * Nicer progress information * Add ETA to progress * More progress bar improvements. * use built in eta * Simplify PopulateRecords * Disable parallel extraction * Update version.json * Catch overflow exceptions * Use GetTags instead of Analyze command for TagDiff and TagTest * Update core-pipeline.yml * Remove Unused UniqueTagsExceptions * Simplify Skip logic * Clean up * Fix Exclusions bug * Fix binary file exclusions * Dont open browser * Remove browser open * only chomp 1024 bytes * Fix logging * Change timeout to milliseconds * Fix binary file detection * Only check 1024 characters for control characters * Fix binary checking * Update MetaDataHelper.cs * fix html tests * Update Utils.cs * Improve binary checking * Remove tag output only option from analyze Use the get tags command. * Code Cleanup * improve some variable names * Remove Simple Tags Tests * Bump dependencies. 
* Simplify JsonWriter * Remove extraneous header on text results * Clean up * Simplify last updated * Remove unused lastupdated references * Fix #343 * Update GetTagsCommand.cs * Clean up Dependencies * Save access and create times * Fix printing to console over progress bar. Console output is saved until after the progress bar completes. * Show file counts in progress bar * Fix cancelling. * Remove unused Dummy Writers * Gfs/cli timeout (#349) * Add overall processing time out for GetTags and Analyze commands * Separate state for timed out skipped * Update FileRecord.cs * Fix binary checking (#351) * Fix binary checking * Fix Binary Checking in GetTags * Update GetTagsCommand.cs * Update AnalyzeCommand.cs * Add async (#354) * Add Async Versions of GetResult appropriate for WASM use * Fix GetTags command uniqueness * Update RuleProcessor.cs * Build fix * Fix #353 * Add async tests * Fixes enumeration printing (#357) * Fix enumeration count of entries. * Build fix * Dont list meaningless info when running get tags progress bar * Update RuleProcessor.cs * Options for Skipping gathering excerpts and skipping unknown files (#361) * Add option to grab number of lines of context and disable gathering on GetTags for performance * Use concurrent que in stead of bag for performance. * Default skip unknown files * build fix * Bump dependencies * Publish Beta Builds from Development * Gfs/some tests (#363) * Fix Unknown files being scanned by all language rules. * New tests and fixes for unknown file type scans * Update AnalyzeJsonWriter.cs * Write Faster GetLastIndex * Slightly faster again * Make fast IndexOf method * Fix * Fix test bug * Test fix * Minor performance improvements. (#364) * Performance and misc improvements (#365) * Keep track of column for matches properly * Recfactor try catch * Fix excertp gathering for async analyze * Rethrow instead of clobbering * Use substring when possible for performance Seeing significant performance uplift. 
* Fix exception getting version info * Fix OatSubstringIndexOperation * Update AnalyzeBenchmark.cs * Fix rule verifier * Support case insensitive string and substring operations * Enable verify rules test * Fix default rules verification to actually check embedded rules. * Clean up isbetween a bit * clean up * Remove unneeded test * Test fix * Add OAT validation to rule validator * Fix same-line findings * Add a rule verification for the within conditions. * Fix rule verifier * Fix storage rules * Add an exporting progress bar. (#367) * Add an exporting progress bar. * Fix gettags command to return actual exit code. * Cache Results of IsCommented (#369) * Improve Scope Match performance * Fix * Update TextContainer.cs * Test fix * Update TestAnalyzeCmd.cs * Update TextContainer.cs * Update TextContainer.cs * Update TextContainer.cs * Update TextContainer.cs * Clean up text container * Update RuleProcessor.cs * Use Globs for file exclusions * Update AnalyzeCommand.cs * Update CLICmdOptions.cs * Add none to disable * Update GetTagsCommand.cs * Fix build * Fix Text Contains Respect parallel in rule processor Update descriptions for command options Reduce sleep frequency * Fix filter tests * Update TestGetTagsCmd.cs * Repro of null rules in match * Remove TagTest GetTags seems to perform the same task. * Remove TagTest command * Limit parallelization to decrease timeouts * Fix test * Fix regexword implementation * Clean up rules Improve some rules, remove some unneeded fields. * Fix Rules and RulePacker Fix Regex Word behavior * Fix Pack Rules * Narrow media regexes * Fix tests * Fix test * Improve TagDiff performane * Fix async analyze * Update TestAnalyzeCmd.cs * Update TestAnalyzeCmd.cs * Update TestAnalyzeCmd.cs * Workaround for IndexOf on Windows * Update OatSubstringIndexOperation.cs * Remove multithread enumerating * Update TextContainer.cs * Update RuleProcessor.cs * Don't precheck matches count. 
* Respect numcontextlines * Update AnalyzeBenchmark.cs * Add Multi path option * Update TestGetTagsCmd.cs * Fix verifier * simplify regex word construction * Better simplify * Update TestGetTagsCmd.cs * Update test numbers to match fixed behavior of regex-word * Update Ruleset.cs * Fix test numbers with fixed regex word * Update TagDiffCommand.cs * Make Get-Tags an option of Analyze GetTags and Analyze were mostly duplicative so instead the GetTags behavior is now provided by giving `-t` or `TagsOnly` to Analyze. * bump to RC * fixes * Rename tests to accurately reflect using analyze command * Support multiple input for TagDiff via comma separated * Remove test for removed functionality * Add missing comments * More comments. * Make FilePathExclusions parsed automatically. * Roslynator Changes * Respond to comments. * Fix tests * Fix tests * More Roslynator Changes * Improve Exclusion Speed (#374) * Shrink Icon Fix #330 * Skip files earlier * Update AnalyzeCommand.cs * fix bad merge * Update version.json * Fix linebreaks to show finished progressbars
Switch to using IndexOf instead of regex when possible. This shows a significant performance uplift.
Fix #362.
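The idea of preferring IndexOf over regex can be sketched as follows. This is only an illustration under a stated assumption: the check used here (a pattern counts as a plain literal when `Regex.Escape` leaves it unchanged) is my own conservative stand-in, and the criteria this PR actually applies are not shown in the conversation. `LiteralFastPathSketch` and its members are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralFastPathSketch
{
    // Assumption: if escaping changes nothing, the pattern contains no regex
    // metacharacters (or whitespace) and can be searched as a literal string.
    public static bool IsPlainLiteral(string pattern) =>
        Regex.Escape(pattern) == pattern;

    // Returns the index of the first match, or -1 when there is none.
    public static int FirstMatchIndex(string text, string pattern)
    {
        if (IsPlainLiteral(pattern))
        {
            // Fast path: ordinal substring search, no regex engine involved.
            return text.IndexOf(pattern, StringComparison.Ordinal);
        }

        // Fallback: the pattern really is a regular expression.
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatchIndex("var password = x;", "password")); // prints 4
        Console.WriteLine(FirstMatchIndex("var password = x;", "pass\\w+"));  // prints 4
        Console.WriteLine(FirstMatchIndex("var password = x;", "token"));     // prints -1
    }
}
```

The check is deliberately conservative: anything it misclassifies falls through to the regex path, so results stay the same and only the speedup is lost.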